Weight Prediction Boosts the Convergence of AdamW

Author: Guan Lei
Publication date: 07/08/2023

In this paper, we introduce weight prediction into the AdamW optimizer to boost its convergence when training deep neural network (DNN) models. In particular, ahead of each mini-batch training step, we predict the future weights according to the update rule of AdamW and then use the predicted weights for both the forward pass and backward propagation. In this way, the AdamW optimizer always uses the gradients w.r.t. the future weights, rather than the current weights, to update the DNN parameters, which improves its convergence. Our proposal is simple and straightforward to implement, yet effective in boosting the convergence of DNN training. We performed extensive experimental evaluations on image classification and language modeling tasks to verify the effectiveness of our proposal. The experimental results show that our proposal boosts the convergence of AdamW and achieves better accuracy than AdamW when training DNN models.
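
The loop described in this abstract (predict the future weights, compute gradients at the predicted weights, then update the real weights) might be sketched as follows. This is a minimal NumPy illustration and not the authors' implementation: the one-step lookahead, the reuse of the cached moment estimates as the predictor, and the names `adamw_step`, `predict_weights`, and `train` are all assumptions made for the example.

```python
import numpy as np


def adamw_step(w, grad, m, v, t, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=1e-2):
    """One standard AdamW update with decoupled weight decay (t starts at 1)."""
    b1, b2 = betas
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)  # bias-corrected second moment
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v


def predict_weights(w, m, v, t, lr=1e-3, betas=(0.9, 0.999),
                    eps=1e-8, weight_decay=1e-2):
    """Assumed predictor: apply the AdamW rule once using only the cached
    moments from the t steps taken so far (no new gradient is available yet)."""
    if t == 0:
        return w.copy()  # no history yet, so predict the current weights
    b1, b2 = betas
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)


def train(grad_fn, w0, steps, lr=1e-3, weight_decay=1e-2):
    """Run AdamW where each gradient is evaluated at the predicted weights.

    grad_fn(w) stands in for a forward pass plus backward propagation.
    """
    w = np.asarray(w0, dtype=float).copy()
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        # 1) Predict the future weights from the optimizer state.
        w_pred = predict_weights(w, m, v, t - 1, lr=lr,
                                 weight_decay=weight_decay)
        # 2) Forward/backward at the predicted weights.
        grad = grad_fn(w_pred)
        # 3) Update the real weights with that gradient.
        w, m, v = adamw_step(w, grad, m, v, t, lr=lr,
                             weight_decay=weight_decay)
    return w


if __name__ == "__main__":
    # Toy problem (hypothetical): minimise 0.5 * ||w - target||^2.
    target = np.array([3.0, -2.0])
    w_final = train(lambda w: w - target, np.zeros(2), steps=1000,
                    lr=0.05, weight_decay=0.0)
    print(w_final)
```

The only change from plain AdamW is step 2, where the gradient is taken at `w_pred` rather than `w`; the optimizer state and the update applied to the real weights are otherwise untouched.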

arXiv.org e-Print Archive

AdaPlus: Integrating Nesterov Momentum and Precise Stepsize Adjustment on AdamW Basis

Author: Guan Lei
Publication date: 05/09/2023

This paper proposes an efficient optimizer called AdaPlus, which integrates Nesterov momentum and precise stepsize adjustment on the basis of AdamW. AdaPlus combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does not introduce any extra hyper-parameters. We perform extensive experimental evaluations on three machine learning tasks to validate the effectiveness of AdaPlus. The experimental results show that AdaPlus (i) is the adaptive method whose performance is closest to (and even slightly better than) that of SGD with momentum on image classification tasks, and (ii) outperforms other state-of-the-art optimizers on language modeling tasks and exhibits the highest stability when training GANs. The experiment code of AdaPlus is available at: https://github.com/guanleics/AdaPlus

arXiv.org e-Print Archive

Capacity-Achieving Iterative LMMSE Detection for MIMO-NOMA Systems

Authors: Guan Yong Liang, Li Ying, Liu Lei, Yuen Chau
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/02/2016

This paper considers iterative Linear Minimum Mean Square Error (LMMSE) detection for uplink Multiuser Multiple-Input Multiple-Output (MU-MIMO) systems with Non-Orthogonal Multiple Access (NOMA). Iterative LMMSE detection greatly reduces the computational complexity of the system by splitting the overall processing into many low-complexity distributed calculations. However, it is generally considered to be sub-optimal and to achieve relatively poor performance. In this paper, we first present the matching conditions and area theorems for iterative detection in MIMO-NOMA systems. Based on these matching conditions and area theorems, the achievable rate region of iterative LMMSE detection is analysed. We prove that, when properly designed, iterative LMMSE detection can achieve (i) the optimal sum capacity of MU-MIMO systems, (ii) all the maximal extreme points in the capacity region of MU-MIMO systems, and (iii) the whole capacity region of two-user MIMO systems. Comment: 6 pages, 5 figures, accepted by IEEE ICC 2016, 23-27 May 2016, Kuala Lumpur, Malaysia

arXiv.org e-Print Archive

Crossref

Research on the Improvement of Calculation Method for the Interference Assembly of Locomotive Traction Gear

Authors: Guan Tianmin, Lei Lei, Qin Meichao
Publication venue: Periodica Polytechnica Budapest University of Technology and Economics
Publication date: 10/09/2020

Interference assembly is the main method for connecting the traction gear to the shaft, and the choice of interference plays a critical role in the design of the traction gear. The traditional method of calculating the interference of the traction gear oversimplifies the mathematical model; its error falls outside the acceptable range, so the method is not suitable for the design of the web structure. In this paper we propose an improved algorithm for solving the interference of the traction gear by combining classical elastic mechanics theory with the finite element segmentation technique. The results of our improved algorithm are compared with those of the traditional method, and the finite element simulation data are compared with the experimental results. Both comparisons verify the rationality and feasibility of our algorithm. Our research provides a theoretical reference and practical guidance for selecting the range of interference.

Periodica Polytechnica (Budapest University of Technology and Economics)